I. Introduction and Summary


A controversy currently exists among statisticians involving the foundations for inference based on a sample from a finite population. This controversy centers on the role of the sampling design and involves fundamental questions in statistical theory. There are two main viewpoints. In the classical or traditional approach, the sampling design plays a major role in inference, since this approach is based on the Randomization Principle. Attention usually focuses on statistical properties of estimators (e.g., bias, mean square error) relative to a given design. In the model-based approach, inference is made within a model framework, a relatively new idea in survey sampling yet a very common one in other statistical procedures. In this approach the main concern is optimality of estimators with respect to the given model, the sampling design playing a secondary role.

In this paper we describe the two approaches and discuss their advantages and disadvantages. Second, we review some of the work of Richard Royall, one of the staunchest supporters of the model-based approach. Lastly, we describe a sampling procedure called the "basket method" and examine its relationship to the two approaches.

II.  The  Problem  and  the  Two  Approaches 
A.  The  Problem 

The basic problem in survey sampling is the following. A population of interest contains N units, labelled 1, ..., N. Associated with each unit i is an unknown value $y_i$. The vector $\mathbf{y} = (y_1, \ldots, y_N)$ may be considered a parameter of the population, with interest usually focusing on some function of $\mathbf{y}$ such as the population total

$$T = \sum_{i=1}^{N} y_i .$$

We will consider problems in which an auxiliary value $x_i$, usually a size measure, is known for each unit. Presumably the x's contain some information about the y's. That informational relationship is characterized and exploited for inference in the model-based approach but ignored in the classical approach. A sample from the population is selected, and the observed y's are used to estimate the parameter of interest, with the x's possibly aiding selection and/or estimation.

B.  Classical  Approach 

1. Description

In the classical approach, the y's are treated as fixed but unknown constants. A sample s of size n is selected according to some sampling design P. In mathematical terms, P is a probability function on the collection S of all subsets of size n which can be formed from the indices {1, 2, ..., N}, with P(s) denoting the probability of selecting those units whose labels are in s.

The data $d = \{(i, y_i) : i \in s\}$ are observed and T is estimated with some estimator $\hat T$. Note that the labels are considered part of the data, since many estimators in survey sampling depend on the labels. Properties such as bias and MSE (mean squared error) are based on the probability distribution generated by the sampling design P, viz.

$$\text{bias} = E_P[\hat T - T] = \sum_{s \in S} P(s)\,\big(\hat T(s) - T\big)$$

$$\text{MSE} = E_P[(\hat T - T)^2] = \sum_{s \in S} P(s)\,\big(\hat T(s) - T\big)^2$$
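Because P puts mass only on the finite collection S, the design bias and MSE above can be computed exactly by enumerating every sample. The following is a minimal Python sketch of that enumeration; the function name `design_bias_mse` and the five-unit population are hypothetical, chosen only to illustrate the two sums for the expansion estimator under simple random sampling.

```python
from itertools import combinations
from math import comb


def design_bias_mse(y, n, design_prob, estimator):
    """Exact design bias and MSE of an estimator, summing over every sample s in S."""
    T = sum(y)                      # population total being estimated
    bias = mse = total_prob = 0.0
    for s in combinations(range(len(y)), n):
        p = design_prob(s)          # P(s)
        err = estimator(s) - T      # T_hat(s) - T
        bias += p * err
        mse += p * err ** 2
        total_prob += p
    # P must be a probability function on S
    assert abs(total_prob - 1.0) < 1e-9
    return bias, mse


# Hypothetical population of N = 5 units; samples of size n = 2.
y = [3.0, 7.0, 4.0, 10.0, 6.0]
N, n = len(y), 2


def srs(s):
    return 1 / comb(N, n)                   # every subset of size n equally likely


def expansion(s):
    return N * sum(y[i] for i in s) / n     # N * ybar_s


bias, mse = design_bias_mse(y, n, srs, expansion)
print(bias, mse)
```

The bias is zero up to rounding, and the MSE equals the familiar SRS variance $N^2(1 - n/N)S_y^2/n = 56.25$ of $N\bar y_s$ (here $S_y^2 = 7.5$).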


2.  Estimators  and  Designs 


Let

$$\bar y_s = \sum_{i \in s} y_i / n \quad\text{and}\quad \bar x_s = \sum_{i \in s} x_i / n$$

denote the sample means of the y's and x's, respectively. Let

$$X = \sum_{i=1}^{N} x_i$$

be the population total of the x's and

$$\bar X = \sum_{i=1}^{N} x_i / N$$

the population mean. Finally, let $\Pi_i$ be the probability that unit i is selected in the sample under an arbitrary design P.

Some common sampling designs and estimators are given below:

Designs:

a. simple random sampling (SRS):

$$\Pi_i = n/N \quad\text{for each } i = 1, \ldots, N$$

b. probability proportional to size (PPSX):

$$\Pi_i = \frac{n x_i}{N \bar X} \quad (\text{available only if } n x_i \le N\bar X \text{ for all } i)$$

c. probability proportional to aggregate size (PPAS):

$$P_{PPAS}(s) = \frac{\sum_{i \in s} x_i}{N' X}, \qquad N' = \binom{N-1}{n-1}$$

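Estimators:

a. $\hat T_E = N\bar y_s$ (expansion estimator)

b. $\hat T_R = \Big(\sum_{i\in s} y_i \Big/ \sum_{i\in s} x_i\Big) X = (\bar y_s/\bar x_s)\sum_{k=1}^{N} x_k$ (ratio estimator)

c. $\hat T_{HT} = \sum_{i\in s} y_i/\Pi_i$, which under PPSX becomes $\Big(\frac1n\sum_{i\in s} y_i/x_i\Big)\sum_{k=1}^{N} x_k$ (Horvitz-Thompson estimator)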

A sampling design P combined with an estimator $\hat T$ is called a sampling strategy and is denoted by the pair $(P, \hat T)$. Consideration is usually given to strategies $(P, \hat T)$ in which $\hat T$ is unbiased with respect to the design P (called a P-unbiased strategy). It is well known that $(\hat T_E, \text{SRS})$, $(\hat T_{HT}, \text{PPSX})$ and $(\hat T_R, \text{PPAS})$ are P-unbiased strategies.
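The P-unbiasedness claims can be verified by the same kind of exhaustive enumeration. This sketch checks $(\hat T_R, \text{PPAS})$ and, for contrast, shows that the ratio estimator is not P-unbiased under SRS; the toy x and y values and the helper `expectation` are hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical population: N = 5 units with size measures x and values y.
x = [2.0, 5.0, 3.0, 8.0, 4.0]
y = [3.0, 7.0, 4.0, 10.0, 6.0]
N, n = len(x), 2
T, X = sum(y), sum(x)
N_prime = comb(N - 1, n - 1)        # number of samples containing a given unit


def ratio_est(s):
    """T_R = (sum_s y / sum_s x) * X."""
    return sum(y[i] for i in s) / sum(x[i] for i in s) * X


def ppas(s):
    """PPAS: P(s) proportional to the aggregate size of s."""
    return sum(x[i] for i in s) / (N_prime * X)


def srs(s):
    return 1 / comb(N, n)


def expectation(prob, est):
    """E_P[est] by summing over all samples of size n."""
    return sum(prob(s) * est(s) for s in combinations(range(N), n))


e_ppas = expectation(ppas, ratio_est)   # equals T
e_srs = expectation(srs, ratio_est)     # differs from T
print(T, e_ppas, e_srs)
```

Under PPAS the factor $\sum_{i\in s} x_i$ in P(s) cancels the denominator of $\hat T_R$, leaving $\sum_{s}\sum_{i\in s} y_i / N' = T$; under SRS no such cancellation occurs (here the expectation is about 30.34 against T = 30).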

3. Arguments For and Against

"Model robustness" in statistics refers to a procedure whose performance does not seriously deteriorate under certain types of departures from the assumed model. Proponents of the classical approach argue that design-based inference is robust in the sense that no probabilistic assumptions about the y's need be made. The estimator $\hat T_E$ is unbiased under SRS no matter how the y's are distributed. They also argue that the use of randomization in selecting samples averages out the effects of unobserved or unknown random variables and safeguards against selection bias.

However, some statisticians point to "negative aspects" of the classical model, for instance, the non-existence of a minimum variance unbiased estimator [2]. Also, the likelihood function arising from this model is informative only in a trivial sense [1]. Given the data $d = \{(i, y_i) : i \in s\}$, all N-vectors $\mathbf{y}^* = (y_1^*, \ldots, y_N^*)$ such that $y_i^* = y_i$ for $i \in s$ have the same likelihood, namely P(s). No unique maximum likelihood estimator exists. When the likelihood principle is applied to survey sampling, it implies that the sampled y's should give the same inference no matter what the design. Also, in the classical approach s is an ancillary statistic, and thus any inference consistent with the conditionality principle should be conditioned on s. But the conditional distribution of the data given s is degenerate. These aspects prompted the use of probability models as a basis for inference in survey sampling.




C.  Model-Based  or  Superpopulation  Approach 
1.  Description 

In the model-based approach, the numbers $y_1, \ldots, y_N$ are treated as realizations of random variables $Y_1, \ldots, Y_N$ characterized by some model $\xi$. Hence the population at hand can itself be considered a sample from a "superpopulation". In this approach the design is relegated to a secondary role, and emphasis is placed on estimators that are "good" with respect to the model no matter how s was selected.

Once a sample s is selected and the y's of the sampled units observed, the quantity T to be estimated can be written as

$$T = \sum_{i \in s} y_i + \sum_{i \in \bar s} Y_i ,$$

the sum of the known y's plus the sum of the values of the random variables for those units not in the sample ($\bar s$ denotes the set of non-sampled units). Thus estimating T becomes a problem of predicting $\sum_{i \in \bar s} Y_i$. The sampled units provide information on the model parameters in $\xi$, which can be used to predict $\sum_{i \in \bar s} Y_i$. The predictor usually takes the form

$$\hat T = \sum_{i \in s} y_i + \hat U,$$

where $\hat U$ is the predictor for $\sum_{i \in \bar s} Y_i$. Thus inference is based on the model $\xi$, contrary to the Randomization Principle. Given a sample s, bias and mean square error (MSE) are defined as

$$\text{bias} = E_\xi[\hat T - T]$$

$$\text{MSE} = E_\xi[(\hat T - T)^2]$$

Under the model-based approach, a good predictor $\hat T$ is one which is optimal in some sense under the model no matter which sample is selected.

A standard optimality criterion, minimum mean square error, calls for selecting a predictor $\hat U$ which minimizes the MSE. This criterion has received exhaustive attention in the statistics literature, and solutions exist for a wide collection of models. For example, suppose $\mathbf{x}_i = (x_{i0}, x_{i1}, \ldots, x_{ir})$ with $x_{i0} = 1$ is a vector of known auxiliary information and $\xi$ is the linear regression model in which the Y's are independent with

$$E_\xi[Y_i] = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_r x_{ir} = \mathbf{x}_i'\boldsymbol\beta = \mu_i,$$

$$\operatorname{Var}_\xi[Y_i] = \sigma^2 v_i .$$

Here $\boldsymbol\beta = (\beta_0, \ldots, \beta_r)$ and $\sigma^2$ are unknown parameters and $v_i$ is known.

In this case linear least squares prediction can be applied. If we restrict $\hat T$ to be linear in the sample y's and $\xi$-unbiased, i.e. $E_\xi[\hat T] = E_\xi[T]$, then $E_\xi[(\hat T - T)^2]$ is minimized by taking

$$\hat T = \hat T_{\xi\text{-BLU}} = \sum_{i \in s} y_i + \hat U^*,$$

where $\hat U^*$ is the BLU (best linear unbiased) estimator of $E_\xi\big[\sum_{i \in \bar s} Y_i\big]$ obtained from generalized least squares. Thus, for any particular sample s, $\hat T_{\xi\text{-BLU}}$ minimizes the mean square error, but the value of this minimum will, in general, depend on which sample s is selected, so that the sample design may be important even in the model-based approach. This suggests tailoring the sample design to the estimator (in contrast to the classical approach of tailoring the estimator to the design) and often leads to a degenerate sampling design which places all its mass on the "best" sample. Adherents of the classical approach argue against such a "purposive" design, since it eliminates randomization as well as the variance estimates that randomization provides.
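To make the generalized least squares step concrete, here is a minimal NumPy sketch, assuming a diagonal variance matrix $\sigma^2\,\mathrm{diag}(v_i)$ as in the model above. The function name `blu_total` and the toy data are hypothetical; the predictor is $\sum_{s} y_i + \sum_{\bar s}\mathbf{x}_i'\hat{\boldsymbol\beta}$, with $\hat{\boldsymbol\beta}$ the GLS estimate using weights $1/v_i$.

```python
import numpy as np


def blu_total(X_s, y_s, v_s, X_r):
    """xi-BLU predictor of T under E[Y_i] = x_i'beta, Var[Y_i] = sigma^2 v_i.

    Returns the sum of sampled y's plus the GLS prediction of the non-sampled total.
    """
    w = 1.0 / np.asarray(v_s, dtype=float)          # GLS weights 1/v_i
    XtW = X_s.T * w                                 # X_s' V^{-1} for diagonal V
    beta_hat = np.linalg.solve(XtW @ X_s, XtW @ y_s)
    return float(y_s.sum() + (X_r @ beta_hat).sum()), beta_hat


def with_intercept(x):
    return np.column_stack([np.ones_like(x), x])


# Hypothetical data: intercept + slope, v_i = x_i, noise-free y = 2 + 3x.
x_s, x_r = np.array([1.0, 2.0, 4.0]), np.array([3.0, 5.0])
y_lin = 2 + 3 * x_s
total_lin, beta_lin = blu_total(with_intercept(x_s), y_lin, x_s, with_intercept(x_r))

# No intercept and v_i = x_i: the GLS slope is sum(y)/sum(x), i.e. the ratio estimator.
y_s = np.array([3.0, 5.0, 9.0])
total_ratio, _ = blu_total(x_s.reshape(-1, 1), y_s, x_s, x_r.reshape(-1, 1))
ratio_estimator = y_s.sum() / x_s.sum() * (x_s.sum() + x_r.sum())
print(total_lin, total_ratio, ratio_estimator)
```

The second call illustrates a useful special case: with no intercept and $v_i = x_i$, the GLS predictor coincides with the ratio estimator $\hat T_R$.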


2. Results of Royall et al.


Richard  Royall  has  been  one  of  the  main  advocates  of  the  prediction 
approach  to  survey  sampling.  Some  of  his  work  is  reviewed  below  using  the 
notation  in  [5]. 

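Let $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$ refer to the polynomial regression model wherein $Y_1, \ldots, Y_N$ are independent random variables with

$$Y_i = \delta_0\beta_0 + \delta_1\beta_1 x_i + \cdots + \delta_J\beta_J x_i^J + \varepsilon_i\,[v(x_i)]^{1/2}.$$

The δ's are indicator variables allowing for the inclusion ($\delta_j = 1$) or deletion ($\delta_j = 0$) of the term $\beta_j x^j$ for $j = 0, 1, \ldots, J$. The $\varepsilon_i$ are independent random variables, each having mean 0 and variance $\sigma^2$; $v(\cdot)$ is a known function. Hence under $\xi$

$$E_\xi[Y_i] = \delta_0\beta_0 + \delta_1\beta_1 x_i + \cdots + \delta_J\beta_J x_i^J$$

and

$$\operatorname{Var}_\xi[Y_i] = \sigma^2 v(x_i).$$

As an example, consider the model $\xi(0,1:x)$. Here $Y_1, \ldots, Y_N$ are independent random variables with

$$E_\xi[Y_i] = \beta x_i, \qquad \operatorname{Var}_\xi[Y_i] = \sigma^2 x_i,$$

with β and σ² unknown.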

Results under $\xi(0,1:x)$ [6]

1. For any given sample s,

$$\hat T_{\xi\text{-BLU}} = \hat T_R = \Big(\sum_{i\in s} y_i \Big/ \sum_{i\in s} x_i\Big)\sum_{k=1}^{N} x_k \quad\text{(ratio estimator)},$$

and for any given design P,

$$E_P E_\xi[(\hat T_R - T)^2] = E_P\big[(N/n)(N-n)\,\sigma^2\,\bar X\,(\bar x_{\bar s}/\bar x_s)\big],$$

where $\bar x_{\bar s}$ is the mean of x over the non-sampled units.

2. The optimal strategy is $(\hat T_R, P^*)$, where

$$P^*(s) = \begin{cases} 1 & \text{if } s = s^* \\ 0 & \text{if } s \ne s^* \end{cases}$$

and $s^*$ denotes the set of n labels with the largest x-values.

3. $\hat T_R$ can be extremely biased if $s^*$ is used and $\xi$ is not the true model. For example, if $\xi(1,1:x)$ is the true model and $s = s^*$,

$$E_\xi[\hat T_R - T] = N\beta_0(\bar X - \bar x_{s^*})/\bar x_{s^*} \ne 0 .$$
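Because $\hat T_R$ is linear in the y's, its model expectation is simply $\hat T_R$ evaluated at the means $\mu_i = E_\xi[Y_i]$. The sketch below uses this to evaluate the bias in result 3 for a hypothetical population under $\xi(1,1:x)$ (the values of $\beta_0$, $\beta_1$ and x are made up) and compares it with the closed form; a sample with $\bar x_s = \bar X$ is included for contrast. The function name `ratio_model_bias` is hypothetical.

```python
def ratio_model_bias(x, s, beta0, beta1):
    """E_xi[T_R - T] under E[Y_i] = beta0 + beta1 * x_i, computed exactly."""
    mu = [beta0 + beta1 * xi for xi in x]
    t_r_at_mu = sum(mu[i] for i in s) / sum(x[i] for i in s) * sum(x)
    return t_r_at_mu - sum(mu)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
N, n = len(x), 2
beta0, beta1 = 10.0, 2.0                       # hypothetical true coefficients

s_star = sorted(range(N), key=lambda i: x[i], reverse=True)[:n]  # n largest x's
xbar_star, X_bar = sum(x[i] for i in s_star) / n, sum(x) / N
bias_star = ratio_model_bias(x, s_star, beta0, beta1)
closed_form = N * beta0 * (X_bar - xbar_star) / xbar_star

bias_even = ratio_model_bias(x, [0, 5], beta0, beta1)  # x = 1 and 6, so xbar_s = X_bar
print(bias_star, closed_form, bias_even)
```

Here the purposive sample $s^*$ gives a bias of about $-21.8$ on a model-expected total of 102, while the sample whose x-mean matches $\bar X$ is unbiased.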

Royall defines a "balanced sample" s as one whose moments match those of the population. Specifically, s is said to be a "balanced sample" of order J, denoted s(J), if

$$\bar x_s^{(j)} = \bar X^{(j)}, \qquad j = 1, \ldots, J,$$

where

$$\bar x_s^{(j)} = \sum_{i \in s} x_i^j / n, \qquad \bar X^{(j)} = \sum_{i=1}^{N} x_i^j / N .$$
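For a small population, the balanced samples of a given order can be found by brute force. The following sketch lists every sample of size n whose first J sample moments equal the population moments exactly, using rational arithmetic so that "equal" means exactly equal (this assumes integer or rational x-values). The population x = 1, ..., 8 and the function names are hypothetical, and exhaustive search is only feasible for tiny N.

```python
from fractions import Fraction
from itertools import combinations


def is_balanced(x, s, J):
    """True if xbar_s^(j) == Xbar^(j) for j = 1..J (exact rational arithmetic)."""
    N, n = len(x), len(s)
    return all(
        Fraction(sum(x[i] ** j for i in s), n) == Fraction(sum(xi ** j for xi in x), N)
        for j in range(1, J + 1)
    )


def balanced_samples(x, n, J):
    """All samples (as index tuples) of size n that are balanced of order J."""
    return [s for s in combinations(range(len(x)), n) if is_balanced(x, s, J)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
found = balanced_samples(x, n=4, J=2)
print(found)
```

For this population it finds exactly two samples of size 4 balanced of order 2: the units with x = 1, 4, 6, 7 and their complement x = 2, 3, 5, 8.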

4. If $s = s(J)$, $\hat T_R$ is unbiased under $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$.

5. When a balanced sample s(J) is selected, then $\hat T_{\xi\text{-BLU}} = \hat T_E = N\bar y_s$, the expansion estimator, under the model $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$. Here

$$v(x) = \sum_{j=0}^{J} \delta_j a_j x^j$$

and $a_0, a_1, \ldots, a_J$ are any positive constants.
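Result 4 can be illustrated with the same device as before: $E_\xi[\hat T_R - T]$ is $\hat T_R$ applied to the model means minus their total. In this sketch the model is a hypothetical quadratic case $\xi(1,1,1 : v(x))$ (the variance function plays no role in the bias), the population is x = 1, ..., 8, and the sample with x = 1, 4, 6, 7 is balanced of order 2. The coefficients, y-values and function name are all hypothetical.

```python
def ratio_model_bias(x, s, coefs):
    """E_xi[T_R - T] when E[Y_i] = sum_j coefs[j] * x_i**j (exact: T_R is linear in y)."""
    mu = [sum(c * xi ** j for j, c in enumerate(coefs)) for xi in x]
    t_r = sum(mu[i] for i in s) / sum(x[i] for i in s) * sum(x)
    return t_r - sum(mu)


x = [1, 2, 3, 4, 5, 6, 7, 8]
coefs = [4.0, -1.5, 0.25]        # hypothetical beta_0, beta_1, beta_2
balanced = (0, 3, 5, 6)          # x = 1, 4, 6, 7: first two moments match the population
largest = (4, 5, 6, 7)           # x = 5, 6, 7, 8: the purposive s*

bias_balanced = ratio_model_bias(x, balanced, coefs)
bias_largest = ratio_model_bias(x, largest, coefs)

# With xbar_s = Xbar, the ratio and expansion estimators coincide for any y-values.
y = [5.0, 1.0, 2.0, 9.0, 3.0, 4.0, 8.0, 6.0]
t_ratio = sum(y[i] for i in balanced) / sum(x[i] for i in balanced) * sum(x)
t_expansion = len(x) * sum(y[i] for i in balanced) / len(balanced)
print(bias_balanced, bias_largest, t_ratio, t_expansion)
```

The balanced sample gives zero model bias even though the model includes an intercept and a quadratic term, whereas $s^*$ does not.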

Therefore, Royall suggests the use of balanced samples to achieve robustness in model-based inference, giving up a small amount of efficiency for protection against model departures. He advises, however, the use of some sort of randomization in selecting balanced samples as a means of balancing on variables which are not explicitly considered in the model. With today's high-speed computers one could randomly choose from (approximately) balanced samples by selecting simple random samples until one is obtained that meets a pre-assigned criterion. Such an approach is used by Royall and Cumberland in [4].
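The rejection procedure just described can be sketched as follows. The acceptance rule used here, that every relative gap $|\bar x_s^{(j)}/\bar X^{(j)} - 1|$ for j = 1, ..., J be at most a tolerance, is a hypothetical choice for illustration and is not necessarily the criterion used in [4]; the function name, its parameters and the population are likewise hypothetical.

```python
import random


def random_balanced_sample(x, n, J, tol, rng, max_tries=100_000):
    """Draw simple random samples until one is approximately balanced of order J."""
    N = len(x)
    pop_moments = [sum(xi ** j for xi in x) / N for j in range(1, J + 1)]
    for _ in range(max_tries):
        s = rng.sample(range(N), n)                      # one SRS of size n
        gaps = [abs(sum(x[i] ** j for i in s) / n / m - 1)
                for j, m in zip(range(1, J + 1), pop_moments)]
        if max(gaps) <= tol:                             # accept: approximately balanced
            return sorted(s)
    raise RuntimeError("no sample met the criterion; loosen tol or raise max_tries")


rng = random.Random(2024)             # seeded for reproducibility
x = list(range(1, 101))               # hypothetical size measures
s = random_balanced_sample(x, n=10, J=2, tol=0.02, rng=rng)
print(s)
```

Because every accepted sample is still the outcome of a random draw, the selection retains an element of randomization, while the tolerance controls how closely the first J moments must match.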


3.  Robust  Variance  Estimation 

Royall strongly believes that inference should be model-based and conditioned on s. Departures from assumed models become a concern in this approach, so Royall concentrated his work on robust model-based procedures such as balanced sampling. Royall and Eberhardt [5] have also dealt with robust variance estimation procedures.

For the advocate of model-based inference, the measure of precision of an estimator is the error variance

$$\operatorname{Var}_\xi(\hat T - T) = E_\xi[(\hat T - T)^2].$$

The error variance usually involves unknown parameters and thus must be estimated. This estimation relies heavily on the variance structure of the adopted model, and the resulting estimate can be biased if the true model has a different structure. Thus what is needed is a robust procedure for estimating this error variance. We take the model $\xi(0,1:x)$ as an example.

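As previously mentioned, under the model $\xi = \xi(0,1:x)$, for a given sample

$$\hat T_R = \Big(\sum_{i\in s} y_i \Big/ \sum_{i\in s} x_i\Big)\sum_{k=1}^{N} x_k$$

is $\xi$-BLU and the error variance is

$$\operatorname{Var}_\xi(\hat T_R - T) = E_\xi[(\hat T_R - T)^2] = \sigma^2\Big(\sum_{i\in\bar s} x_i \Big/ \sum_{i\in s} x_i\Big)\sum_{k=1}^{N} x_k .$$

An estimator $v_L$ of $\operatorname{Var}_\xi(\hat T_R - T)$ can be obtained from weighted least squares and is

$$v_L = (N/n)(N-n)\Big[\Big(\sum_{i\in s} d_i^2/x_i\Big)\Big/(n-1)\Big]\,\bar x_{\bar s}\bar X/\bar x_s,$$

where

$$d_i = y_i - (\bar y_s/\bar x_s)\,x_i .$$

$v_L$ is $\xi$-unbiased for all samples s, but only if $\operatorname{Var}[Y_i] \propto x_i$; it can be badly biased if the model fails in that respect.

The usual estimator of the variance of $\hat T_R$ when simple random sampling is used is

$$v_C = (N/n)(N-n)\sum_{i\in s} d_i^2/(n-1).$$

Royall and Eberhardt [5] derived two variance estimators, $v_1$ and $v_2$, which are model-unbiased for all samples when the variance is proportional to size, and asymptotically unbiased for quite general variance structures:

$$v_1 = v_C\,(\bar X\bar x_{\bar s}/\bar x_s^2)\,(1 - c_s^2/n)^{-1},$$

where

$$c_s^2 = (n-1)^{-1}\sum_{i\in s}(x_i - \bar x_s)^2/\bar x_s^2,$$

and

$$v_2 = (N/n^2)(N-n)\,(\bar x_{\bar s}\bar X/\bar x_s^2)\sum_{i\in s} d_i^2\big/\big[1 - x_i/(n\bar x_s)\big].$$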
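For concreteness, here is a NumPy transcription of the four estimators as written above; the function name is hypothetical. Under $\xi(0,1:x)$ one can check that $E_\xi[d_i^2] = \sigma^2 x_i\big(1 - x_i/\sum_{j\in s} x_j\big)$, which is what makes the correction factors in $v_1$ and $v_2$ yield model-unbiased estimators under that model. When all x-values are equal, the four estimators coincide; the toy call below, with hypothetical data, uses that as a sanity check.

```python
import numpy as np


def ratio_variance_estimators(y_s, x_s, x_pop):
    """v_L, v_C, v_1, v_2 for the ratio estimator, as written in the text.

    y_s, x_s: values for the n sampled units; x_pop: all N x-values (sampled ones included).
    """
    y_s, x_s, x_pop = (np.asarray(a, dtype=float) for a in (y_s, x_s, x_pop))
    N, n = x_pop.size, x_s.size
    X_bar, xs_bar = x_pop.mean(), x_s.mean()
    xr_bar = (x_pop.sum() - x_s.sum()) / (N - n)      # mean x over non-sampled units
    d = y_s - (y_s.mean() / xs_bar) * x_s             # residuals d_i
    f = (N / n) * (N - n)

    v_L = f * (np.sum(d ** 2 / x_s) / (n - 1)) * xr_bar * X_bar / xs_bar
    v_C = f * np.sum(d ** 2) / (n - 1)
    c2 = np.sum((x_s - xs_bar) ** 2) / ((n - 1) * xs_bar ** 2)   # relvariance c_s^2
    v_1 = v_C * (X_bar * xr_bar / xs_bar ** 2) / (1 - c2 / n)
    v_2 = (N / n ** 2) * (N - n) * (xr_bar * X_bar / xs_bar ** 2) * np.sum(
        d ** 2 / (1 - x_s / (n * xs_bar)))
    return {"v_L": v_L, "v_C": v_C, "v_1": v_1, "v_2": v_2}


# Hypothetical check: with all x-values equal, the four estimators coincide.
equal = ratio_variance_estimators([3.0, 5.0, 4.0, 8.0], [2.0] * 4, [2.0] * 10)
print(equal)
```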

In an empirical study of six populations to which the model might apply, Royall and Cumberland [4] compare the above estimators. Their general conclusions are that $v_1$ and $v_2$ are superior to $v_C$ in simple random samples, and that even these were reliable only in well-balanced samples. The estimator $v_C$ depends on the value of $\bar x_s$ and is greatly biased if $\bar x_s$ is very different from $\bar X$. The estimators $v_C$, $v_1$ and $v_2$ are approximately equal when $\bar x_s = \bar X$, and inference is much better under these samples. The estimator $v_L$, although unbiased under $\xi$ for every s, was not robust enough to perform well in the studied populations.

III.  The  "Basket  Method"  for  Selecting  Balanced  Samples. 

A.  Background 

About the time that Royall and Herson were developing the concepts of robust estimation within the framework of a model-based approach [6], Wallenius was investigating an application of sampling in the area of price estimation [8]. The scenario for this application is described in detail in [10]. Briefly, the units comprising the population are price proposals from a sole-source contractor with whom the government is doing business. Associated with the ith unit are the contractor's proposed price $x_i$ and an unknown price $y_i$, which will be determined through a time-consuming process of government price analysis and negotiation with the contractor. If we assume the contractor has a fairly good idea of what the latter price should be, he could set the proposed price in accordance with any "padding strategy" which suited his purpose. Thus we are dealing with a competitive situation, with large sums of money at stake, in which one of the players, the contractor, can control the state of nature. It is up to the other player, the government, to carefully analyze each proposal and fully prepare for the negotiation phase in order to avoid overpayment. The process can be quite involved, both technically and in time. It is fairly common to find a large backlog of proposals awaiting analysis and negotiation. Since processing this backlog expeditiously is in the best interest of both the contractor and the government, the situation seemed appropriate for a statistical treatment. The element of competition mentioned above required special consideration: most classical procedures were vulnerable to gamesmanship and, by the same token, it would be foolish to base inference on any structured model for the relationship between proposed and negotiated prices. In the next section we describe a statistical tool called the "basket method", developed to handle this situation. It combines the concepts of randomization and balanced sampling by partitioning the population into a collection of balanced subsets from which one is selected at random.


B.  Description 

Suppose that the population under study is to be partitioned into K disjoint subsets $b_1, b_2, \ldots, b_K$, not necessarily of equal size. One subset is selected at random, the y's observed, and T estimated on the basis of that sample. If it were possible to accomplish this partitioning in such a way that

$$\sum_{i\in b_j} y_i = \sum_{i\in b_k} y_i = (1/K)\sum_{i=1}^{N} y_i$$

for every pair $j \ne k$, then the estimator $\hat T_B = K\sum_{i\in b} y_i$ (b being the sample selected at random) would equal T no matter which sample was selected.

But the y's are unknown in advance, so this type of balance cannot be expected. If we assume that the y's are characterized by some probability model $\xi$, then a reasonable approximation to the above procedure would be to partition the units into subsets such that

$$\sum_{i\in b_j} \hat Y_i = \sum_{i\in b_k} \hat Y_i = (1/K)\sum_{i=1}^{N} \hat Y_i$$

for every pair $j \ne k$, where $\hat Y_i = E_\xi[Y_i]$ is the predicted $Y_i$ under the model $\xi$. Then choose a sample and estimate T with $\hat T_B = K\sum_{i\in b} y_i$. If we let $P_B$ denote the design that selects one sample at random from the K samples, and assume the Y's are independent under $\xi$, then it can be shown that, with the strategy $(P_B, \hat T_B)$, the optimal allotment of units to samples under $\xi$ is one in which

$$\sum_{i\in b_j} \hat Y_i = \sum_{i\in b_k} \hat Y_i$$

for every $j \ne k$. Specifically, $E_\xi E_{P_B}[(\hat T_B - T)^2]$ is minimized for such an allotment, which we shall call balancing on total predicted Y in the sample. (The variance component of this expected squared error is the same for every allotment, so only the squared-bias terms $\big(K\sum_{i\in b}\hat Y_i - \sum_{i=1}^{N}\hat Y_i\big)^2$ depend on how the units are allotted.) As in the previous case, however, balancing on total predicted Y cannot be carried out directly in practice, since $\hat Y_i$ may depend on unknown parameters in $\xi$. Fortunately, for a wide class of practical models $\xi$, balancing on total predicted Y can be achieved even though the model parameters are unknown.
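The claim that only the bias terms depend on the allotment can be checked numerically. The sketch below computes $E_\xi E_{P_B}[(\hat T_B - T)^2]$ exactly for independent $Y_i$ with given means and variances; the function name and the toy population ($E[Y_i] = x_i$ and $\operatorname{Var}[Y_i] = x_i$, i.e. β = σ² = 1 in $\xi(0,1:x)$) are hypothetical.

```python
def basket_strategy_mse(mu, var, baskets):
    """E_xi E_PB[(T_B - T)^2] for T_B = K * (y-total of one basket chosen with prob. 1/K),
    assuming independent Y_i with means mu[i] and variances var[i]."""
    K = len(baskets)
    mu_tot, var_tot = sum(mu), sum(var)
    mse = 0.0
    for b in baskets:
        mu_b = sum(mu[i] for i in b)
        var_b = sum(var[i] for i in b)
        # T_B - T = (K - 1) * (sum over b) - (sum over the other baskets)
        bias = K * mu_b - mu_tot
        variance = (K - 1) ** 2 * var_b + (var_tot - var_b)
        mse += (bias ** 2 + variance) / K
    return mse


# Hypothetical population with E[Y_i] = x_i and Var[Y_i] = x_i, split into K = 2 baskets.
x = [8, 7, 6, 5, 4, 3, 2, 1]
balanced = [[0, 3, 4, 7], [1, 2, 5, 6]]      # x-totals 18 and 18
unbalanced = [[0, 1, 2, 3], [4, 5, 6, 7]]    # x-totals 26 and 10
mse_bal = basket_strategy_mse(x, x, balanced)
mse_unbal = basket_strategy_mse(x, x, unbalanced)
print(mse_bal, mse_unbal)
```

Both allotments share the same variance component, $(K-1)\sum_{i=1}^{N}\operatorname{Var}[Y_i] = 36$; the unbalanced allotment adds 256 in squared bias.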

Consider the model $\xi(0,1:x)$ previously mentioned. Under this model $E_\xi[Y_i] = \beta x_i$, where β is unknown and $x_i$ is known. The "basket method" can achieve the balance on total predicted Y in the sample under this model.

The "basket method" [10] is a procedure whereby the population of N units is partitioned into K samples, called "baskets", of size $n = [N/K]$ or $n = [N/K] + 1$ (where $[\cdot]$ denotes the greatest integer function) such that

$$\sum_{i\in \mathrm{bsk}(e)} x_i \approx \sum_{i\in \mathrm{bsk}(f)} x_i \approx \frac1K\sum_{i=1}^{N} x_i$$

for any two baskets e and f. A basket is selected at random and T is estimated by

$$\hat T = \Big(\sum_{i\in \mathrm{bsk}(e)} y_i \Big/ \sum_{i\in \mathrm{bsk}(e)} x_i\Big)\sum_{k=1}^{N} x_k \approx K\sum_{i\in \mathrm{bsk}(e)} y_i .$$

Note that

$$\sum_{i\in \mathrm{bsk}(e)} E_\xi[Y_i] = \beta\sum_{i\in \mathrm{bsk}(e)} x_i \approx \beta\sum_{i\in \mathrm{bsk}(f)} x_i = \sum_{i\in \mathrm{bsk}(f)} E_\xi[Y_i]$$

for any two baskets e and f, and the desired balance on total predicted Y in the sample is achieved, whatever the value of β.

Briefly, the basket formation algorithm works as follows. If a sample size of approximately n is desired, then $K = [N/n]$ baskets are formed. The units are arranged in decreasing order of x and labelled 1, ..., N, with the unit having the largest x-value labelled 1, the second largest labelled 2, and so on. Starting with the first K units (the K largest x's), place one unit in each basket. The remaining units are partitioned into successive groups of K units each. A group of K units is assigned to the K baskets by the following rule: compute the basket totals and arrange the baskets in order of increasing totals [10]. Then assign the K units, one per basket, in sequential fashion, with the largest unassigned unit being placed in the basket with the smallest total. If N/K is not an integer, the last group (smallest x's) will not contain K units, with the result that some baskets will have one unit fewer than the others. As mentioned earlier, actual basket size is either [N/K] or [N/K] + 1. The initial basket formation should result in nearly equal basket totals on x, but a swapping algorithm is used to bring the basket totals into even closer agreement. Experience gained by applying this simple algorithm to real populations indicates that the basket method results in nearly identical basket totals. It is possible, of course, to construct a population for which the basket method does not yield a good balance on total x, but we have not encountered such a situation in practice.
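The initial assignment rule can be sketched in a few lines of Python. This is a minimal sketch of the formation step only: the swapping routine is not specified in this paper and is omitted, and ties between basket totals are broken by basket index, which is an assumption. The function name `form_baskets` and the population are hypothetical.

```python
def form_baskets(x, K):
    """Initial basket formation (no swapping step).

    Returns (baskets, totals): baskets[b] lists the unit indices in basket b,
    totals[b] is that basket's x-total.
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)  # decreasing x
    baskets = [[] for _ in range(K)]
    totals = [0.0] * K
    for start in range(0, len(order), K):
        group = order[start:start + K]                        # next K largest; last may be short
        by_total = sorted(range(K), key=lambda b: totals[b])  # increasing basket totals
        for unit, b in zip(group, by_total):                  # largest unit -> smallest total
            baskets[b].append(unit)
            totals[b] += x[unit]
    return baskets, totals


x = list(range(1, 21))                  # hypothetical sizes 1..20
baskets, totals = form_baskets(x, K=4)
print(totals)
```

For x = 1, ..., 20 and K = 4 this yields basket totals of 54, 53, 52 and 51 against a target of 52.5, with each basket holding exactly one unit from every group of four consecutive x-ranks; a swapping pass would then try to narrow the spread.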


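Empirical evidence shows that the basket formation algorithm, while designed to achieve good balance on $\sum_{i\in \mathrm{bsk}(e)} x_i$, does surprisingly well in balancing higher moments as well. This is due to the nature of the method: in the initial formation each basket receives one unit from every successive group of K units ranked by x, so its x-values are spread over the whole range of the population. Hence we may expect to achieve approximate balancing of total predicted Y in the sample under higher-degree polynomial models. For example, suppose that $E[Y_i] = \beta_0 + \beta_1 x_i + \beta_2 x_i^2$. If

$$\sum_{i\in \mathrm{bsk}(e)} x_i \approx \sum_{i\in \mathrm{bsk}(f)} x_i \quad\text{and}\quad \sum_{i\in \mathrm{bsk}(e)} x_i^2 \approx \sum_{i\in \mathrm{bsk}(f)} x_i^2$$

for any pair of baskets e and f of the same size, then

$$\sum_{i\in \mathrm{bsk}(e)} E[Y_i] \approx \sum_{i\in \mathrm{bsk}(f)} E[Y_i].$$

Since the sample size n may differ by one unit from basket to basket, inclusion of an intercept term can have a small adverse effect on the total predicted Y balance. Supposing basket f contains n units and basket e contains n + 1 units,

$$\sum_{i\in \mathrm{bsk}(e)} E(Y_i) \approx (n+1)\beta_0 + \beta_1\sum_{i\in \mathrm{bsk}(f)} x_i + \beta_2\sum_{i\in \mathrm{bsk}(f)} x_i^2 = \beta_0 + \sum_{i\in \mathrm{bsk}(f)} E(Y_i).$$

In practice, $\beta_0$ is usually very small relative to $\sum_{i\in \mathrm{bsk}(f)} E(Y_i)$, so the problem is inconsequential.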
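A simple way to check moment balance on a given partition is to compare each basket's sum of $x^j$ with $(1/K)\sum_{i=1}^{N} x_i^j$ for several j. The sketch below does this for the four baskets that the initial assignment rule produces (worked out by hand, without swapping) for the hypothetical population x = 1, ..., 20 with K = 4; the population and function name are illustrations, not data from [10].

```python
# Baskets of x-values from the initial assignment rule (no swapping) applied by hand
# to the hypothetical population x = 1, ..., 20 with K = 4.
baskets = [[20, 13, 12, 5, 4], [19, 14, 11, 6, 3], [18, 15, 10, 7, 2], [17, 16, 9, 8, 1]]


def moment_deviation(baskets, j):
    """Relative deviation of each basket's sum of x**j from (1/K) * population sum of x**j."""
    K = len(baskets)
    target = sum(v ** j for b in baskets for v in b) / K
    return [sum(v ** j for v in b) / target - 1 for b in baskets]


for j in (1, 2, 3):
    print(j, [round(d, 4) for d in moment_deviation(baskets, j)])
```

For this tiny population the first-moment deviations are within 3% and the second-moment deviations within about 5%; with only five units per basket, balance this coarse is to be expected.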

Now suppose we have adopted a polynomial model $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$, i.e.

$$Y_i = p(x_i) + \varepsilon_i[v(x_i)]^{1/2} = \delta_0\beta_0 + \delta_1\beta_1 x_i + \cdots + \delta_J\beta_J x_i^J + \varepsilon_i[v(x_i)]^{1/2},$$

but the true model is

$$Y_i = f(x_i) + \varepsilon_i[w(x_i)]^{1/2},$$

where f and w are arbitrary unknown functions. Assuming f can be adequately approximated with a Taylor series polynomial $p^*$ through the Jth term, we have

$$E[Y_i] = f(x_i) \approx p^*(x_i) = \beta_0 + \beta_1 x_i + \cdots + \beta_J x_i^J .$$

Even though the coefficients are unknown, the basket method algorithm will result, approximately, in

$$\sum_{i\in \mathrm{bsk}(e)} p^*(x_i) = \sum_{i\in \mathrm{bsk}(f)} p^*(x_i),$$

so that, again,

$$\sum_{i\in \mathrm{bsk}(e)} E[Y_i] \approx \sum_{i\in \mathrm{bsk}(f)} E[Y_i]$$

for any two baskets e and f.


C.  Relation  to  Royall's  Work 
Now since

$$\sum_{i\in \mathrm{bsk}(e)} x_i \approx \frac1K\sum_{k=1}^{N} x_k$$

for every basket e, we have

$$\bar x_{\mathrm{bsk}(e)} = \sum_{i\in \mathrm{bsk}(e)} x_i/n \approx \frac{1}{nK}\sum_{k=1}^{N} x_k = (N/nK)\,\bar X = \begin{cases}\bar X & \text{if } N/K = [N/K] \\ (1 \pm \delta)\bar X & \text{if } N/K \ne [N/K]\end{cases}$$

where $0 \le \delta < 1/n$. Thus $\bar x_{\mathrm{bsk}(e)} \approx \bar X$, and we have a Royall "balanced sample" of degree 1. Similarly, if

$$\sum_{i\in \mathrm{bsk}(e)} x_i^j \approx \sum_{i\in \mathrm{bsk}(f)} x_i^j \approx \frac1K\sum_{i=1}^{N} x_i^j$$

for $j = 2, \ldots, J$, then

$$\bar x^{(j)}_{\mathrm{bsk}(e)} \approx \bar x^{(j)}_{\mathrm{bsk}(f)} \approx \bar X^{(j)}$$

and the sample is a "balanced sample of degree J." As we have seen previously, estimators which are optimal relative to certain model assumptions have some nice robustness properties if we take the trouble of finding a balanced sample. The ratio estimator is unbiased under a number of polynomial models when the sample is balanced, and is also optimal in some cases.


The basket method is also related to some of Royall's work on balancing stratified samples. Let the population of N units be stratified into H strata as follows: the units whose x-values are smallest form stratum 1, the next smallest units form stratum 2, and so on. For $h = 1, 2, \ldots, H$, a sample $s_h$ of size $n_h$ is selected from stratum h, which has $N_h$ units, and the stratum totals are estimated by

$$\hat T_h = \Big(\sum_{i\in s_h} y_{hi} \Big/ \sum_{i\in s_h} x_{hi}\Big)\sum_{k=1}^{N_h} x_{hk} \quad\text{(ratio estimator)},$$

where $\sum_{s_h} y_{hi}$ and $\sum_{s_h} x_{hi}$ are the sample totals of y and x in stratum h, and $\sum_{k=1}^{N_h} x_{hk}$ is the total of x in stratum h.

The population total is estimated by

$$\hat T^* = \sum_{h=1}^{H} \hat T_h .$$

Royall and Herson [7] have shown that $\hat T^*$ is unbiased under $\xi(0,1:x)$, and is unbiased under more general polynomial models of degree J if the $s_h$ are selected such that

$$\bar x^{(j)}_{s_h} = \bar X^{(j)}_h, \qquad j = 1, \ldots, J;\ h = 1, \ldots, H,$$

where $\bar X^{(j)}_h = \sum_{k=1}^{N_h} x_{hk}^j / N_h$. Such a sample is called a stratified balanced sample of degree J and is denoted by $s^*(J)$. Royall and Herson have proven that if $n_h \propto N_h \bar X_h^{1/2}$, then under $\xi = \xi(\delta_0, \delta_1, \ldots, \delta_J : x)$ the strategy $[s^*(J), \hat T^*]$ is more efficient than $[s(J), \hat T_R]$.

If s is a sample which contains units from all strata, it can be shown that $\hat T^*$ is optimal under a model $\xi^*$ for which

$$Y_{hi} = \beta_h x_{hi} + \varepsilon_{hi}\,x_{hi}^{1/2},$$

where $E_{\xi^*}[\varepsilon_{hi}] = 0$ and $E_{\xi^*}[\varepsilon_{hi}^2] = \sigma_h^2$, i.e., slope and variance scale factors vary from stratum to stratum. If a stratified balanced sample is chosen, then $\hat T^*$ is optimal under any model of the form

$$Y_{hi} = \delta_{0h}\beta_{0h} + \delta_{1h}\beta_{1h}x_{hi} + \cdots + \delta_{Jh}\beta_{Jh}x_{hi}^J + \varepsilon_{hi}\,x_{hi}^{1/2}.$$

This says that $\hat T^*$, while designed to be optimal relative to model $\xi^*$, is robust relative to piecewise polynomial model departures.

While designed to generate approximately balanced samples, the basket method algorithm does a good job of generating approximately stratified balanced samples, no matter how the strata boundaries are defined, as long as the population strata sizes are large relative to K. When a stratum size is small or moderate, it is difficult (if not impossible) to select a balanced sample of order J using any technique. For convenience, define strata sizes $N_h$ so that $N_h/K = n_h$ are integers for $h = 1, 2, \ldots, H$. (It may happen that $N_h/K$ is not an integer, but that is more of a nuisance than a problem in practice.) Then each basket sample s is an approximate stratified balanced sample of some order J since, by the nature of the basket algorithm (with $s_h$ denoting the units of s that fall in stratum h),

$$\sum_{i\in s_h} x_{hi}^j \approx \frac1K\sum_{k=1}^{N_h} x_{hk}^j, \qquad j = 1, \ldots, J \text{ for some } J;\ h = 1, \ldots, H .$$

Thus

$$\bar x^{(j)}_{s_h} = \sum_{i\in s_h} x_{hi}^j/n_h \approx \Big[\frac1K\sum_{k=1}^{N_h} x_{hk}^j\Big]\Big/(N_h/K) = \sum_{k=1}^{N_h} x_{hk}^j/N_h = \bar X_h^{(j)} .$$

Also, for s,

$$\hat T^* = \sum_{h=1}^{H}\Big[\Big(\sum_{i\in s_h} y_{hi}\Big/\sum_{i\in s_h} x_{hi}\Big)\sum_{k=1}^{N_h} x_{hk}\Big] \approx \sum_{h=1}^{H}\Big[\Big(\sum_{i\in s_h} y_{hi}\Big/\frac1K\sum_{k=1}^{N_h} x_{hk}\Big)\sum_{k=1}^{N_h} x_{hk}\Big] = K\sum_{h=1}^{H}\sum_{i\in s_h} y_{hi} = K\sum_{i\in s} y_i .$$

Since overall balance is achieved without regard to strata,

$$\sum_{i\in s} x_i \approx \frac1K\sum_{k=1}^{N} x_k,$$

and thus

$$\hat T_R \approx K\sum_{i\in s} y_i .$$

Thus two estimators which are optimal under two different models become practically identical for a basket-produced sample, the common estimator being optimal under a variety of models for this sample.


D.  Relation  to  Classical  Approach 
1.  P-unbiasedness  of  the  basket  strategy 

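We will refer to the pair $(P_{BSK}, \hat T_R)$ as the "basket strategy". This strategy is approximately P-unbiased since

$$\begin{aligned}
E_{P_{BSK}}(\hat T_R) &= \sum_{e=1}^{K} P_{BSK}(\mathrm{bsk}(e))\,\hat T_R(\mathrm{bsk}(e)) \\
&= \sum_{e=1}^{K} \frac1K\Big(\sum_{i\in \mathrm{bsk}(e)} y_i \Big/ \sum_{i\in \mathrm{bsk}(e)} x_i\Big)\sum_{k=1}^{N} x_k \\
&\approx \sum_{e=1}^{K} \frac1K\Big(\sum_{i\in \mathrm{bsk}(e)} y_i \Big/ \frac1K\sum_{k=1}^{N} x_k\Big)\sum_{k=1}^{N} x_k = \sum_{e=1}^{K}\sum_{i\in \mathrm{bsk}(e)} y_i = T .
\end{aligned}$$

It could easily be made exactly P-unbiased if we changed $P_{BSK}(\mathrm{bsk}(e))$ from 1/K to $\sum_{i\in \mathrm{bsk}(e)} x_i \big/ \sum_{k=1}^{N} x_k$, that is, if we used selection probabilities proportional to basket totals. This would correspond to a PPAS plan over a very restricted class of samples.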
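Both statements can be checked numerically. The sketch below uses four baskets obtained by applying the initial assignment rule by hand to the hypothetical population x = 1, ..., 20 with K = 4 (no swapping), together with a hypothetical y = x², and computes the expectation of $\hat T_R$ under equal selection probabilities 1/K and under probabilities proportional to basket x-totals.

```python
# Baskets of x-values from the initial assignment rule (no swapping) applied by hand
# to the hypothetical population x = 1, ..., 20 with K = 4; y = x**2 is also hypothetical.
baskets = [[20, 13, 12, 5, 4], [19, 14, 11, 6, 3], [18, 15, 10, 7, 2], [17, 16, 9, 8, 1]]


def y_of(xv):
    return xv ** 2


K = len(baskets)
X = sum(sum(b) for b in baskets)                  # population x-total
T = sum(y_of(v) for b in baskets for v in b)      # population y-total


def ratio_est(b):
    """T_R computed from basket b."""
    return sum(y_of(v) for v in b) / sum(b) * X


e_equal = sum(ratio_est(b) / K for b in baskets)          # P_BSK(bsk(e)) = 1/K
e_pps = sum(sum(b) / X * ratio_est(b) for b in baskets)   # P proportional to basket x-total
print(T, e_equal, e_pps)
```

Under equal probabilities the expectation misses T = 2870 by about 0.7 (roughly 0.02%), because the basket totals 54, 53, 52 and 51 are only approximately equal; under basket-total probabilities the error vanishes up to rounding.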

We can also view the population as being made up of K "macro units" (baskets) from which a sample of size 1 is to be selected. Associated with macro unit k is an unknown number $T_k$, our goal being to estimate $T = \sum_{k=1}^{K} T_k$. Now the basket algorithm has the effect of making the component x-values in each basket resemble the x-values in the whole population, only "thinned out" somewhat, so we should not expect the individual x-values in the selected basket to play a role in the estimation process. This is exactly what happens: the ratio estimator (which uses the auxiliary information in the x-values) degenerates to the expansion estimator, that is,

$$\hat{T}_R = \Big(\sum_{i\in bsk(f)} y_i \Big/ \sum_{i\in bsk(f)} x_i\Big)\sum_{k=1}^{N} x_k \approx K T_f = \hat{T}_E \quad (\text{when } N = nK),$$

where $T_f$ is the y total in the randomly selected basket $bsk(f)$. The usefulness of the auxiliary information in the process of estimation has been eliminated during the process of sampling. But the sample is random, so the objections of the proponents of the classical approach to purposive sampling do not apply to the basket strategy. The auxiliary information x, assumed to be relevant for inference, has been homogenized among baskets. Other variables, assumed irrelevant, are dealt with through the randomization principle. Thus, the basket strategy should satisfy both the classicists and the advocates of the superpopulation approach.
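A small sketch, under the simplifying assumption that the basket x-totals are exactly equal, comparing $\hat{T}_R$ with $K T_f$ for each basket that could be selected. The data and the partition into baskets are hypothetical; with exactly equal totals the two estimators coincide, and with nearly equal totals they are close.

```python
from fractions import Fraction

def ratio_vs_expansion(baskets, x, y):
    """For each basket, return the pair (T_R, K * T_f)."""
    X, K = sum(x), len(baskets)
    rows = []
    for basket in baskets:
        T_f = sum(y[i] for i in basket)   # y-total of the basket
        X_f = sum(x[i] for i in basket)   # x-total of the basket
        rows.append((Fraction(T_f, X_f) * X, K * T_f))
    return rows

# Hypothetical population with N = 6 = nK (n = 2, K = 3).
x = [6, 5, 4, 3, 2, 1]
y = [10, 9, 7, 6, 3, 2]
baskets = [[0, 5], [1, 4], [2, 3]]   # every basket has x-total 7 = X / K

for T_R, T_E in ratio_vs_expansion(baskets, x, y):
    print(T_R, T_E)   # 36 36, then 36 36, then 39 39
```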


2.  Basket  Sample  as  a  Stratified  Sample 

Inherent in the basket procedure is a form of stratified sampling. Assume the population has been ordered by decreasing x-value and labelled 1, ..., N. Let $S_1$ denote the stratum of the K units whose x-values are largest, $S_2$ the stratum of the K units whose x-values are next largest, and so on until $S_n$, where $n = [N/K]$. $S_n$ contains the $K + t$ smallest units, where $t = N - nK$. In other words, all strata contain K units except possibly $S_n$, which may contain $K + t$. If N/K is an integer, all are of size K. Every basket sample formed in the initial process of the algorithm contains one unit from each of $S_1, \ldots, S_n$, with possibly two units from $S_n$. The swapping routine may upset this structure slightly as it tries to bring the totals into better balance. Regardless, each sample is a stratified sample, although not all stratified samples are possible, owing to the fixed number of baskets and the deterministic aspect of basket formation. Nevertheless, the samples are indeed representative in terms of the x-values.
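The stratification described above can be sketched as follows. Units are 0-indexed, ties in x are broken by original position (an assumption; the text does not say how ties are handled), and `form_strata` is a hypothetical helper that only forms $S_1, \ldots, S_n$; it does not reproduce the basket assignment or the swapping routine.

```python
def form_strata(x, K):
    """Split units into strata S_1, ..., S_n by decreasing x.

    S_1 holds the K units with the largest x, S_2 the next K, and so on;
    the last stratum S_n receives the K + t smallest units, t = N - nK.
    """
    N = len(x)
    n = N // K                       # n = [N/K]
    if n == 0:
        raise ValueError("need at least K units")
    # Stable sort: ties keep their original order (assumed tie rule).
    order = sorted(range(N), key=lambda i: x[i], reverse=True)
    strata = [order[h * K:(h + 1) * K] for h in range(n - 1)]
    strata.append(order[(n - 1) * K:])   # S_n, of size K + t
    return strata

x = [3, 9, 1, 7, 5, 2, 8]    # N = 7, so with K = 3: n = 2, t = 1
print(form_strata(x, 3))     # [[1, 6, 3], [4, 0, 5, 2]]
```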

3.  Comparison  with  Simple  Random  Sampling  and  Stratified  Random  Sampling 

For simplicity, assume that $N = nK$ and that the population is stratified as in section 2, so that each $S_h$ contains K units. The variances of three sampling strategies will be compared:

a. $(SRS, \hat{T}_E)$: a simple random sample s of size n is drawn from the population (without regard to strata) and T is estimated by

$$\hat{T}_E = N\bar{y}_s = \frac{N}{n}\sum_{i\in s} y_i .$$

b. $(STRS, \hat{T}_{ST})$: STRS indicates stratified random sampling. A stratified random sample s of size n is selected by choosing one unit at random from each of the n strata. The total T is estimated by

$$\hat{T}_{ST} = \sum_{h=1}^{n} K\bar{y}_h = K\sum_{h=1}^{n} \bar{y}_h ,$$

where $\bar{y}_h$ is the sample mean in stratum h, which equals $y_h$, the single value selected, since only one unit is chosen from each stratum.

c. $(P_{BSK}, \hat{T}_R)$: the "basket method".

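Let $V_P(\hat{T})$ denote the variance of the estimator $\hat{T}$ under the design P. Then for a specific population $y_1, \ldots, y_N$, it is well known that

$$V_{SRS}(\hat{T}_E) = \frac{N(N-n)}{n}\sum_{i=1}^{N} (y_i - \bar{y})^2/(N-1), \qquad \bar{y} = \sum_{i=1}^{N} y_i/N,$$

and

$$V_{STRS}(\hat{T}_{ST}) = K\sum_{j=1}^{n}\sum_{i=1}^{K} (y_{ij} - \bar{y}_j)^2,$$

where $y_{ij}$ is the ith observation in the jth stratum and $\bar{y}_j$ is the jth stratum mean of y. Also, treating $\hat{T}_R$ as the expansion estimator $K T_e$ (section 1), and writing $\bar{y}_{bsk(e)}$ for the average of y in basket e,

$$V_{BSK}(\hat{T}_R) = \frac{N^2}{K}\sum_{e=1}^{K} (\bar{y}_{bsk(e)} - \bar{y})^2,$$

since $N = nK$.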
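To make the three variance formulas concrete, this sketch evaluates each one on a hypothetical six-unit population (N = 6, K = 3, n = 2) and checks it against a brute-force enumeration of every possible sample under the corresponding design. For the basket design, $\hat{T}_R$ is again treated as $K T_e$, and the three baskets are hand-picked rather than produced by the basket algorithm; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def design_variance(estimates):
    """Variance of an estimator over equally likely samples."""
    m = len(estimates)
    mean = sum(estimates, Fraction(0)) / m
    return sum(((e - mean) ** 2 for e in estimates), Fraction(0)) / m

def v_srs(y, n):
    """N(N-n)/n * sum (y_i - ybar)^2 / (N-1)."""
    N = len(y)
    ybar = Fraction(sum(y), N)
    return Fraction(N * (N - n), n) * sum((v - ybar) ** 2 for v in y) / (N - 1)

def v_strs(strata_y):
    """K * sum_j sum_i (y_ij - ybar_j)^2, one unit per stratum of size K."""
    K = len(strata_y[0])
    within = Fraction(0)
    for ys in strata_y:
        m = Fraction(sum(ys), K)
        within += sum((v - m) ** 2 for v in ys)
    return K * within

def v_bsk(baskets_y):
    """(N^2/K) * sum_e (ybar_bsk(e) - ybar)^2 with N = nK."""
    K, n = len(baskets_y), len(baskets_y[0])
    N = n * K
    ybar = Fraction(sum(map(sum, baskets_y)), N)
    return Fraction(N * N, K) * sum((Fraction(sum(b), n) - ybar) ** 2 for b in baskets_y)

# Toy population listed in decreasing x order.
y = [10, 9, 7, 6, 3, 2]
N, K, n = 6, 3, 2
strata_y = [y[0:3], y[3:6]]              # S_1, S_2
baskets_y = [[10, 2], [9, 3], [7, 6]]    # hypothetical balanced baskets

# Enumerate every equally likely sample under each design.
srs_enum = design_variance([Fraction(N, n) * sum(s) for s in combinations(y, n)])
strs_enum = design_variance([Fraction(K * sum(s)) for s in product(*strata_y)])
bsk_enum = design_variance([Fraction(K * sum(b)) for b in baskets_y])

print(srs_enum, v_srs(y, n))        # 122 122
print(strs_enum, v_strs(strata_y))  # 40 40
print(bsk_enum, v_bsk(baskets_y))   # 2 2
```

On this toy population, where y follows the x ordering closely, the basket design has by far the smallest variance; a different population could rank the three designs differently.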

Direct comparison of $V_{SRS}(\hat{T}_E)$, $V_{STRS}(\hat{T}_{ST})$, and $V_{BSK}(\hat{T}_R)$ depends on the properties of the population at hand. There may be specific populations where each performs better than the others. Therefore, we will regard the population values $y_1, \ldots, y_N$ as being drawn from an infinite superpopulation described by a model. Hence the results obtained will apply not to any specific population but to the average over all populations that can be drawn from the superpopulation. Comparison will be made under two model formulations:


(i). Model I: $E_I[Y_i] = \mu$, $Var_I[Y_i] = \sigma_i^2$, $E_I(Y_i - \mu)(Y_j - \mu) = 0$ for any pair $i \neq j$.

Here we are hypothesizing that Y has no relationship with x, and thus the ordering in the population is essentially random.

A modification of the results in Konijn [3] shows that

$$E_I[V_{SRS}(\hat{T}_E)] = E_I[V_{STRS}(\hat{T}_{ST})] = E_I[V_{BSK}(\hat{T}_R)] = \frac{N(N-n)}{n}\sum_{i=1}^{N}\sigma_i^2/N .$$

Thus in this case the three strategies are equally efficient under the assumed model. As mentioned before, for any one population there may be substantial differences among the three.
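A Monte Carlo sketch of the Model I result: populations are generated repeatedly with a common mean and unequal variances, and the three design variances are averaged over them. Model I specifies only the first two moments, so the normal distribution used here is an assumption, as are the particular values of $\mu$, $\sigma_i^2$, the strata, and the baskets.

```python
import math
import random
from statistics import fmean

def v_srs(y, n):
    """Design variance of the expansion estimator under SRS of size n."""
    N = len(y)
    ybar = sum(y) / N
    return N * (N - n) / n * sum((v - ybar) ** 2 for v in y) / (N - 1)

def v_strs(y, strata):
    """Design variance of T_ST, one unit drawn from each stratum of K units."""
    K = len(strata[0])
    total = 0.0
    for S in strata:
        m = sum(y[i] for i in S) / K
        total += sum((y[i] - m) ** 2 for i in S)
    return K * total

def v_bsk(y, baskets):
    """Design variance of K * T_e when each of the K baskets has probability 1/K."""
    K, N = len(baskets), len(y)
    n = N // K
    ybar = sum(y) / N
    return N * N / K * sum((sum(y[i] for i in b) / n - ybar) ** 2 for b in baskets)

# Hypothetical setup: N = 6, K = 3, n = 2; units indexed in decreasing x order.
mu = 5.0
sigma2 = [4.0, 1.0, 2.0, 3.0, 1.0, 5.0]
strata = [[0, 1, 2], [3, 4, 5]]
baskets = [[0, 5], [1, 4], [2, 3]]
N, K = len(sigma2), len(baskets)
n = N // K

random.seed(1)
reps = 20000
draws = {"SRS": [], "STRS": [], "BSK": []}
for _ in range(reps):
    # Normal errors are an assumption; Model I fixes only means and variances.
    y = [random.gauss(mu, math.sqrt(s2)) for s2 in sigma2]
    draws["SRS"].append(v_srs(y, n))
    draws["STRS"].append(v_strs(y, strata))
    draws["BSK"].append(v_bsk(y, baskets))

theory = N * (N - n) / n * sum(sigma2) / N   # 32.0 for these sigma_i^2
averages = {name: fmean(vals) for name, vals in draws.items()}
print(theory, averages)   # all three averages should be near 32
```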


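(ii) Model II: $E_{II}[Y_i] = \mu_i$, $Var_{II}[Y_i] = \sigma_i^2$, $E_{II}(Y_i - \mu_i)(Y_j - \mu_j) = 0$ for any pair $i \neq j$.

Again, a slight modification of the results in Konijn [3] shows that under this model

$$E_{II}[V_{SRS}(\hat{T}_E)] = \frac{N(N-n)}{n}\sum_{i=1}^{N}\sigma_i^2/N + \frac{N(N-n)}{n}\sum_{i=1}^{N}(\mu_i - \bar{\mu})^2/(N-1),$$

where $\bar{\mu} = \sum_{i=1}^{N} \mu_i/N$;

$$E_{II}[V_{STRS}(\hat{T}_{ST})] = \frac{N(N-n)}{n}\sum_{i=1}^{N}\sigma_i^2/N + K\sum_{j=1}^{n}\sum_{i=1}^{K}(\mu_{ij} - \bar{\mu}_j)^2,$$

where $\mu_{ij}$ is the mean of the random variable $Y_{ij}$ associated with the ith unit in stratum j and $\bar{\mu}_j = \sum_{i=1}^{K} \mu_{ij}/K$; and

$$E_{II}[V_{BSK}(\hat{T}_R)] = \frac{N(N-n)}{n}\sum_{i=1}^{N}\sigma_i^2/N + \frac{N^2}{K}\sum_{e=1}^{K}(\bar{\mu}_{bsk(e)} - \bar{\mu})^2,$$

where $\bar{\mu}_{bsk(e)} = \sum_{i\in bsk(e)} \mu_i/n$, the average of $\mu_i$ in basket e. Now suppose that $\mu_i = \alpha + \beta x_i$ and $\sigma_i^2 = \sigma^2 x_i$. Then substitution in the above formulas shows that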


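$$E_{II}[V_{SRS}(\hat{T}_E)] = A + \frac{N(N-n)\beta^2}{n}\sum_{i=1}^{N}(x_i - \bar{x})^2/(N-1),$$

$$E_{II}[V_{STRS}(\hat{T}_{ST})] = A + \frac{N}{n}(K-1)\beta^2\sum_{j=1}^{n}\sum_{i=1}^{K}(x_{ij} - \bar{x}_j)^2/(K-1),$$

$$E_{II}[V_{BSK}(\hat{T}_R)] = A + \frac{N^2}{K}\beta^2\sum_{e=1}^{K}(\bar{x}_{bsk(e)} - \bar{x})^2 \approx A \quad (\text{since } \bar{x}_{bsk(e)} \approx \bar{x}),$$

where

$$A = N(N-n)(\sigma^2/n)\bar{x}.$$

Thus

$$E_{II}[V_{BSK}(\hat{T}_R)] \lesssim E_{II}[V_{SRS}(\hat{T}_E)]$$

and

$$E_{II}[V_{BSK}(\hat{T}_R)] \lesssim E_{II}[V_{STRS}(\hat{T}_{ST})].$$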


The basket procedure is therefore more efficient, on average, for populations generated from Model II. The quantities $E_{II}[V_{SRS}(\hat{T}_E)]$ and $E_{II}[V_{STRS}(\hat{T}_{ST})]$ depend, respectively, on the population variance of x and the within-stratum variance of x. Thus, which of these two is more efficient depends on the particular x's at hand.
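The Model II expressions above can be evaluated directly for a given set of x-values. The following sketch (hypothetical helper `model2_expected_variances`, invented x-values, hand-picked perfectly balanced baskets, and an assumed $N = nK$) computes A and the three expected variances; $\alpha$ does not appear because it cancels.

```python
def model2_expected_variances(x, strata, baskets, beta, sigma2):
    """E_II of the three design variances when mu_i = alpha + beta * x_i
    and sigma_i^2 = sigma2 * x_i.  Assumes N = nK and K > 1."""
    N, K = len(x), len(baskets)
    n = N // K
    xbar = sum(x) / N
    A = N * (N - n) * (sigma2 / n) * xbar
    # SRS: population sum of squares of x.
    srs = A + N * (N - n) * beta ** 2 / n * sum((v - xbar) ** 2 for v in x) / (N - 1)
    # STRS: within-stratum sum of squares of x.
    within = 0.0
    for S in strata:
        m = sum(x[i] for i in S) / len(S)
        within += sum((x[i] - m) ** 2 for i in S)
    strs = A + (N / n) * (K - 1) * beta ** 2 * within / (K - 1)
    # BSK: spread of the basket means of x around xbar.
    between = sum((sum(x[i] for i in b) / n - xbar) ** 2 for b in baskets)
    bsk = A + (N ** 2 / K) * beta ** 2 * between
    return A, srs, strs, bsk

x = [6, 5, 4, 3, 2, 1]               # decreasing x, N = 6
strata = [[0, 1, 2], [3, 4, 5]]      # K = 3 units per stratum
baskets = [[0, 5], [1, 4], [2, 3]]   # basket means all equal xbar = 3.5
A, srs, strs, bsk = model2_expected_variances(x, strata, baskets, beta=2.0, sigma2=1.0)
print(A, srs, strs, bsk)   # 42.0 210.0 90.0 42.0
```

Here $E_{II}[V_{BSK}(\hat{T}_R)]$ equals A exactly because every basket mean of x equals $\bar{x}$; with only approximately balanced baskets it would sit slightly above A.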

